Optimizations, including tiling, often target a single level of memory or parallelism, such as the cache. These optimizations usually operate level by level, guided by a cost function parameterized by features of that single level. The benefit of optimizations guided by such one-level cost functions decreases as architectures tend towards hierarchies of memory and parallelism. We have identified...
This paper applies unimodular transformations and tiling to improve the data locality of a loop nest. Due to data dependences and reuse patterns, not all loops can or should be tiled. Therefore, the approach proposed in this paper attempts to capture as much data reuse in the cache as possible while tiling as few loops as possible. By using cones to represent the data dependences and vector spaces...
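The effect of tiling on data locality can be illustrated with a minimal sketch. The example below tiles a naive matrix multiplication so that each pass works on small sub-blocks that fit in cache; the tile size parameter and function name are illustrative, not taken from the paper above.

```python
def matmul_tiled(A, B, n, tile=2):
    """Multiply two n x n matrices with a tiled loop nest.

    Tiling restructures the i/j/k nest so that the sub-blocks of
    A, B and C touched at any moment stay cache-resident; `tile`
    is a hypothetical blocking factor chosen to fit the cache.
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # One tile of work: the same small sub-blocks are
                # reused across the inner three loops.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        s = C[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
    return C
```

The transformation changes only the iteration order, not the result, which is what makes it legal whenever the loop's data dependences permit the interchange.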
This paper presents some compilation techniques to compress holes. Holes are the memory locations mapped by useless template cells and are caused by the non-unit alignment stride in a two-level data-processor mapping. In a two-level data-processor mapping, there is a repeated pattern for array elements mapped onto processors. We classify blocks into classes and use a class table to record the attributes...
Data parallel languages like High Performance Fortran demand efficient compile-time and run-time techniques for tasks such as address generation. Array references with arbitrary affine subscripts can make the task of compilers for such languages highly involved. This paper deals with efficient address generation in programs with array references having two types of commonly encountered affine references,...
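The address-generation problem can be made concrete with a small sketch. For an affine reference A(a*i + b) to an array cyclically distributed over P processors, each processor must enumerate the iterations that touch its own elements. The brute-force enumeration below is only the baseline definition of the problem (efficient schemes exploit the periodicity of the access pattern); the function and its parameters are illustrative, not the paper's algorithm.

```python
def local_iterations(a, b, P, p, n_iters):
    """Iterations i in [0, n_iters) whose access A(a*i + b) lands
    on processor p under a cyclic(1) distribution over P processors.

    Element e is owned by processor e mod P, so iteration i is
    local to p exactly when (a*i + b) mod P == p.
    """
    return [i for i in range(n_iters) if (a * i + b) % P == p]
```

Because (a*i + b) mod P repeats with a period dividing P, the local iteration set is itself periodic, which is the structure efficient address-generation techniques exploit.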
The data distribution problem is very complex, because it involves trade-off decisions between minimizing communication and maximizing parallelism. A common approach towards solving this problem is to break the data mapping into two stages: an alignment stage and a distribution stage. The alignment stage attempts to increase parallelism, while the distribution stage attempts to decrease communication...
Highly parallel computers have the memory capacity and potential speed to perform very high-resolution time-dependent calculations. Parallel computers with hundreds of fast processors require highly scalable algorithms to avoid wasting expensive resources. On these machines careful attention must be given to program design to fully exploit scalable algorithms. We have proposed a programming model...
Static Single Assignment (SSA) form has shown its usefulness as a program representation for code optimization techniques in sequential programs. We introduce the Concurrent Static Single Assignment (CSSA) form to represent explicitly parallel programs with interleaving semantics and post-wait synchronization. The parallel construct considered in this paper is cobegin/coend. A new confluence function,...
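The core idea of SSA renaming, on which the CSSA extension builds, can be sketched in a few lines: every redefinition of a variable receives a fresh name, and a confluence (phi) function at a join point selects the version belonging to the predecessor actually executed. The example below simulates this in plain Python; the names and the dictionary-based phi are illustrative only.

```python
def phi(pred, versions):
    """SSA confluence function: pick the version of a variable
    according to which predecessor block reached the join point."""
    return versions[pred]

def example(c):
    # Original code:        SSA form:
    #   x = 1                 x1 = 1
    #   if c: x = 2           if c: x2 = 2
    #   y = x                 x3 = phi(x1, x2); y1 = x3
    x1 = 1
    versions = {"entry": x1}
    if c:
        x2 = 2                  # redefinition gets a new name
        versions["then"] = x2
        pred = "then"
    else:
        pred = "entry"
    x3 = phi(pred, versions)    # x3 = phi(x1, x2)
    return x3
```

In sequential SSA the phi operands are indexed by control-flow predecessors as above; the CSSA form described in the abstract must additionally account for interleavings between parallel threads, which is where its new confluence function comes in.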
Pointer analysis is essential for optimizing and parallelizing compilers. It examines pointer assignment statements and estimates pointer-induced aliases among pointer variables or possible shapes of dynamic recursive data structures. However, previously proposed techniques perform pointer analysis without knowledge of the traversal patterns of the dynamic recursive data structures being constructed....
This paper presents some compiler and program transformation techniques for concurrent multithreaded architectures, in particular the superthreaded architecture [9], which adopts a thread pipelining execution model that allows threads with data dependences and control dependences to be executed in parallel. In this paper, we identify several important program analysis and transformation techniques...
This paper proposes solutions to two important problems with parallel programming environments that were not previously addressed. The first issue is that current compilers are typically black-box tools with which the user has little interaction. Information gathered by the compiler, although potentially very meaningful for the user, is often inaccessible or hard to decipher. Second, compilation and...
The only way for parallelizing compilers to exploit the potential parallelism of loops for which dependence information is statically inadequate is to use run-time loop parallelization techniques. There are two approaches in this field: the inspector-executor method [17] and the speculative DOALL test [13]. The former always incurs heavy preprocessing overhead during the inspector phase and...
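The inspector-executor scheme mentioned above can be sketched briefly. The inspector runs a cheap pass over the loop's (indirect) subscripts to group iterations into wavefronts of mutually independent iterations; the executor then runs each wavefront, whose iterations could execute in parallel. The code below is a minimal illustration for a single indirectly indexed update, not the method of [17].

```python
def inspector(idx, n_iters):
    """Partition iterations into wavefronts.

    Iteration i conflicts only with earlier iterations touching the
    same element A[idx[i]], so it goes one wave after that element's
    last writer.
    """
    last_wave = {}   # element -> wave of its most recent writer
    waves = []
    for i in range(n_iters):
        w = last_wave.get(idx[i], -1) + 1
        last_wave[idx[i]] = w
        if w == len(waves):
            waves.append([])
        waves[w].append(i)
    return waves

def executor(A, idx, waves):
    """Run the loop body A[idx[i]] += 1 wave by wave."""
    for wave in waves:
        for i in wave:   # iterations within a wave are independent
            A[idx[i]] += 1
    return A
```

The preprocessing cost of building the wavefronts is exactly the "heavy inspector overhead" the abstract refers to: it is paid at run time, before any parallel work begins.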
Handling the procedure interface in an HPF compiler is complex due to the many possible combinations of Fortran 90/HPF properties of an actual array argument and its associated dummy argument. This paper describes an algorithm that reduces this complexity by mapping all the combinations of properties to a small set of canonical Internal Representations. These internal representations as well as the...
This paper describes an ongoing effort supported by the ARPA PCRC (Parallel Compiler Runtime Consortium) project. In particular, we discuss the design and implementation of an HPF compilation system based on the PCRC runtime. The approaches to issues such as directive analysis and communication detection are discussed in detail. The discussion includes fragments of code generated by the compiler.
Many large-scale computational applications contain irregular data access patterns related to unstructured problem domains. Examples include finite element methods, computational fluid dynamics, and molecular dynamics codes. Such codes are difficult to parallelize efficiently with current HPF compilers. However, most of these problems exhibit spatial locality. This property is exploited by our approach...
This extended abstract motivates and briefly describes a strategy for computing symbolic constraints on the values of integer variables and using them to simplify the control flow of compiler-generated parallel programs. This strategy has been implemented and evaluated in the context of the Rice dHPF compiler for High Performance Fortran.
We present an efficient global communication optimizer based on array data-flow analysis, which manages the analysis cost by partitioning the data-flow problems into subproblems and solving the subproblems one at a time in a demand-driven manner. In comparison to traditional array data-flow based techniques, our scheme greatly reduces the memory requirement and manages the analysis time more effectively...
In this paper, we consider the problem of generating efficient, portable communication in compilers for parallel languages. We introduce the Ironman abstraction, which separates data transfer from its implementing communication paradigm. This is done by annotating the compiler-generated code with legal ranges for data transfer in the form of calls to the Ironman library. On each target platform, these...